Detection Theory Basics


Detection theory concerns making decisions from data. Decisions are based on presumptive models that 
may have produced the data. Making a decision involves inspecting the data and determining which model 
was most likely to have produced them. In this way, we are detecting which model was correct. Decision 
problems pervade signal processing. In digital communications, for example, determining if the current bit 
received in the presence of channel disturbances was a zero or a one is a detection problem. 


More concretely, we denote by $\mathcal{M}_i$ the $i$th model that could have generated the data $R$. A "model" is
captured by the conditional probability distribution of the data, which is denoted by the vector $R$. For
example, model $i$ is described by $p_{R|\mathcal{M}_i}(r)$. Given all the models that can describe the data, we need to
choose which model best matched what was observed. The word "best" is key here: what is the optimality 
criterion, and does the detection processing and the decision rule depend heavily on the criterion used? 
Surprisingly, the answer to the second question is "No." All of detection theory revolves around the 
likelihood ratio test, which as we shall see, emerges as the optimal detector under a wide variety of 
optimality criteria. 


The Likelihood Ratio Test 


In a binary detection problem in which we have two models, four possible decision outcomes can result. 
Model $\mathcal{M}_0$ did in fact represent the best model for the data and the decision rule said it was (a correct
decision) or said it wasn't (an erroneous decision). The other two outcomes arise when model $\mathcal{M}_1$ was in
fact true with either a correct or incorrect decision made. The decision process operates by segmenting the
range of observation values into two disjoint decision regions $Z_0$ and $Z_1$. All values of $r$ fall into either
$Z_0$ or $Z_1$. If a given $r$ lies in $Z_0$, we will announce our decision "model $\mathcal{M}_0$ was true"; if in $Z_1$, model
$\mathcal{M}_1$ would be proclaimed. To derive a rational method of deciding which model best describes the
observations, we need a criterion to assess the quality of the decision process so that optimizing this 
criterion will specify the decision regions. 


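The Bayes' decision criterion seeks to minimize a cost function associated with making a decision. Let
$C_{ij}$ be the cost of mistaking model $j$ for model $i$ ($i \neq j$) and $C_{ii}$ the presumably smaller cost of correctly
choosing model $i$: $C_{ij} > C_{ii}$, $i \neq j$. Let $\pi_i$ be the a priori probability of model $i$. The so-called Bayes'
cost $\bar{C}$ is the average cost of making a decision.

Equation:
$$\bar{C} = \sum_{i,j} C_{ij} \Pr[\text{say } \mathcal{M}_i \text{ when } \mathcal{M}_j \text{ true}] = \sum_{i,j} \pi_j C_{ij} \Pr[\text{say } \mathcal{M}_i \mid \mathcal{M}_j \text{ true}]$$

The Bayes' cost can be expressed as

Equation:
$$\bar{C} = \sum_{i,j} \pi_j C_{ij} \Pr[R \in Z_i \mid \mathcal{M}_j \text{ true}] = \sum_{i,j} \pi_j C_{ij} \int_{Z_i} p_{R|\mathcal{M}_j}(r)\,dr$$
$$= \int_{Z_0} \left( \pi_0 C_{00}\, p_{R|\mathcal{M}_0}(r) + \pi_1 C_{01}\, p_{R|\mathcal{M}_1}(r) \right) dr + \int_{Z_1} \left( \pi_0 C_{10}\, p_{R|\mathcal{M}_0}(r) + \pi_1 C_{11}\, p_{R|\mathcal{M}_1}(r) \right) dr$$

To minimize this expression with respect to the decision regions $Z_0$ and $Z_1$, ponder which integral would
yield the smallest value if its integration domain included a specific value of the observation vector. To
minimize the sum of the two integrals, whichever integrand is smaller should include that value of $r$ in its
integration domain. We conclude that we choose $\mathcal{M}_0$ for those values of $r$ yielding a smaller value for the
first integral.

$$\pi_0 C_{00}\, p_{R|\mathcal{M}_0}(r) + \pi_1 C_{01}\, p_{R|\mathcal{M}_1}(r) < \pi_0 C_{10}\, p_{R|\mathcal{M}_0}(r) + \pi_1 C_{11}\, p_{R|\mathcal{M}_1}(r)$$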


We choose $\mathcal{M}_1$ when the inequality is reversed. This expression is easily manipulated to obtain the
crowning result of detection theory: the likelihood ratio test.

Equation:
$$\frac{p_{R|\mathcal{M}_1}(r)}{p_{R|\mathcal{M}_0}(r)} \underset{\mathcal{M}_0}{\overset{\mathcal{M}_1}{\gtrless}} \frac{\pi_0 (C_{10} - C_{00})}{\pi_1 (C_{01} - C_{11})}$$

The comparison relation means selecting model $\mathcal{M}_1$ if the left-hand ratio exceeds the value on the right;
otherwise, $\mathcal{M}_0$ is selected. The likelihood ratio $\frac{p_{R|\mathcal{M}_1}(r)}{p_{R|\mathcal{M}_0}(r)}$, symbolically represented by $\Lambda(r)$, encapsulates
the signal processing performed by the optimal detector on the observations $r$. The optimal decision rule
then compares that scalar-valued result with a threshold $\eta$ equaling $\frac{\pi_0 (C_{10} - C_{00})}{\pi_1 (C_{01} - C_{11})}$. The likelihood ratio test
can be succinctly expressed as the comparison of the likelihood ratio with a threshold.

Equation:
$$\Lambda(r) \underset{\mathcal{M}_0}{\overset{\mathcal{M}_1}{\gtrless}} \eta$$
The data processing operations are captured entirely by the likelihood ratio A(r). However, the 
calculations required by the likelihood ratio can be simplified in many cases. Note that only the value of 
the likelihood ratio relative to the threshold matters. Consequently, we can perform any positively 
monotonic transformation simultaneously on the likelihood ratio and the threshold without affecting the 
result of the comparison. For example, we can multiply by a positive constant, add any constant or apply a 
monotonically increasing function to reduce the complexity of the expressions. We single out one such 
function, the logarithm, because it often simplifies likelihood ratios that commonly occur in signal 
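processing applications. Using this function, known as the log-likelihood, we explicitly express the likelihood ratio test
as

Equation:
$$\ln \Lambda(r) \underset{\mathcal{M}_0}{\overset{\mathcal{M}_1}{\gtrless}} \ln \eta$$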

Which simplifying transformations are useful is problem-dependent. But, by laying bare what aspect of the
observations is essential to the model-testing problem, we reveal the sufficient statistic $\Upsilon(r)$: the scalar
quantity which best summarizes the data for detection purposes. The likelihood ratio test is best expressed
in terms of the sufficient statistic.

Equation:
$$\Upsilon(r) \underset{\mathcal{M}_0}{\overset{\mathcal{M}_1}{\gtrless}} \gamma$$

We denote the threshold value for the sufficient statistic by $\gamma$ or by $\eta$ when the likelihood ratio is used in
the comparison.
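As a minimal sketch of the test just described, the Python fragment below computes the Bayes threshold from the priors and costs and applies the comparison both on the likelihood ratio and on its logarithm. The function names and the two unit-variance Gaussian densities used as models are illustrative assumptions, not part of the text.

```python
import math

def bayes_threshold(pi0, pi1, c00, c01, c10, c11):
    """Threshold eta = pi0 (C10 - C00) / (pi1 (C01 - C11))."""
    return pi0 * (c10 - c00) / (pi1 * (c01 - c11))

def gaussian_pdf(x, mean, sigma):
    """Scalar Gaussian density, used here only as a stand-in likelihood function."""
    return math.exp(-0.5 * ((x - mean) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def lrt_decide(r, p0, p1, eta):
    """Announce model 1 when the likelihood ratio exceeds eta, else model 0."""
    return 1 if p1(r) / p0(r) > eta else 0

def log_lrt_decide(r, p0, p1, eta):
    """Same test after applying the (monotonically increasing) logarithm to both sides."""
    return 1 if math.log(p1(r)) - math.log(p0(r)) > math.log(eta) else 0

# Hypothetical models: unit-variance Gaussians with means 0 (model 0) and 1 (model 1).
p0 = lambda r: gaussian_pdf(r, 0.0, 1.0)
p1 = lambda r: gaussian_pdf(r, 1.0, 1.0)

# Equal priors, unit cost for either error, zero cost for correct decisions.
eta = bayes_threshold(0.5, 0.5, 0.0, 1.0, 1.0, 0.0)

observations = (-1.0, 0.2, 0.8, 2.0)
decisions = [lrt_decide(r, p0, p1, eta) for r in observations]
log_decisions = [log_lrt_decide(r, p0, p1, eta) for r in observations]
```

Both decision lists agree, which illustrates that a monotonically increasing transformation applied to both sides leaves the outcome of the comparison unchanged.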


The likelihood ratio comprises the quantities $p_{R|\mathcal{M}_i}(r)$, which are known as likelihood functions
and play an important role in estimation theory. It is the likelihood function that portrays the probabilistic
model describing data generation. The likelihood function completely characterizes the kind of "world"
assumed by each model. For each model, we must specify the likelihood function so that we can solve the
hypothesis testing problem.


A complication, which arises in some cases, is that the sufficient statistic may not be monotonic. If it is
monotonic, the decision regions $Z_0$ and $Z_1$ are simply connected: all portions of a region can be reached
without crossing into the other region. If not, the regions are not simply connected and decision region 
islands are created. Disconnected regions usually complicate calculations of decision performance. 
Monotonic or not, the decision rule proceeds as described: the sufficient statistic is computed for each 
observation vector and compared to a threshold. 


Example: 
The coach of a soccer team suspects his goalie has been less than attentive to his training regimen. The
coach focuses on the kicks the goalie makes to send the ball down the field. The data $r$ he observes is the
length of a kick. The coach defines the models as

• $\mathcal{M}_0$: not maintaining a training regimen
• $\mathcal{M}_1$: is maintaining a training regimen

The conditional densities (the models) of the kick length are shown in [link].

Conditional densities for the distribution of the lengths of soccer
kicks assuming that the goalie has not attended to his training
($\mathcal{M}_0$) or did ($\mathcal{M}_1$) are shown in the top row. The lower portion
depicts the likelihood ratio formed from these densities.


Based on knowledge of soccer player behavior, the coach assigns a priori probabilities of $\pi_0 = 1/4$ and
$\pi_1 = 3/4$. The costs $C_{ij}$ are chosen to reflect the coach's sensitivity to the goalie's feelings:
$C_{01} = 1 = C_{10}$ (an erroneous decision either way is given the same cost) and $C_{00} = 0 = C_{11}$. The
likelihood ratio is plotted in [link] and the threshold value $\eta$, which is computed from the a priori
probabilities and the costs to be 1/3, is indicated. The calculations of this comparison can be simplified in
an obvious way.

$$\frac{r}{50} \underset{\mathcal{M}_0}{\overset{\mathcal{M}_1}{\gtrless}} \frac{1}{3} \quad \text{or} \quad r \underset{\mathcal{M}_0}{\overset{\mathcal{M}_1}{\gtrless}} \frac{50}{3} \approx 16.7$$

The multiplication by the factor of 50 is a simple illustration of the reduction of the likelihood ratio to a
sufficient statistic. Based on the assigned costs and a priori probabilities, the optimum decision rule says
the coach must assume that the goalie did not train if a kick is less than 16.7; if greater, the goalie is
assumed to have trained despite producing an abysmally short kick such as 20. Note that the densities
given by each model overlap entirely: the possibility of making the wrong interpretation always haunts
the coach. However, no other procedure will be better (produce a smaller Bayes' cost)!
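The arithmetic of the coach's rule can be checked with a few lines of Python. This is only a sketch: it assumes, as the factor-of-50 remark implies, that the likelihood ratio equals r/50 over the range of interest, and the function name `coach_decides` is made up for illustration.

```python
from fractions import Fraction

# A priori probabilities and costs quoted in the example.
pi0, pi1 = Fraction(1, 4), Fraction(3, 4)
c00, c01, c10, c11 = 0, 1, 1, 0

# Bayes threshold eta = pi0 (C10 - C00) / (pi1 (C01 - C11)).
eta = pi0 * (c10 - c00) / (pi1 * (c01 - c11))

# Assumption: likelihood ratio = r / 50, so multiplying the test by 50
# gives a threshold on the kick length itself.
gamma = 50 * eta

def coach_decides(kick_length):
    """Apply the sufficient-statistic form of the test to one kick length."""
    return "trained" if kick_length > gamma else "did not train"
```

With these numbers the threshold on the likelihood ratio is 1/3 and the threshold on the kick length is 50/3, so a kick of 20 is attributed to a goalie who trained.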


Detection Performance Criteria 


The criterion used in the previous section---minimize the average cost of an 
incorrect decision---may seem to be a contrived way of quantifying 
decisions. Well, often it is. For example, the Bayesian decision rule depends 
explicitly on the a priori probabilities. Expecting a rational method of assigning
values to these, either by experiment or through true knowledge of the
relative likelihood of each model, may be unreasonable. In this section, we
develop alternative decision rules that try to respond to such objections. 
One essential point will emerge from these considerations: the likelihood 
ratio persists as the core of optimal detectors as optimization criteria 
and problem complexity change. Even criteria remote from performance 
error measures can result in the likelihood ratio test. Such an invariance 
does not occur often in signal processing and underlines the likelihood ratio 
test's importance. 


Maximizing the Probability of a Correct Decision 


As only one model can describe any given set of data (the models are 
mutually exclusive), the probability of being correct $P_c$ for distinguishing
two models is given by

$$P_c = \Pr[\text{say } \mathcal{M}_0 \text{ when } \mathcal{M}_0 \text{ true}] + \Pr[\text{say } \mathcal{M}_1 \text{ when } \mathcal{M}_1 \text{ true}]$$

We wish to determine the optimum decision region placement. Expressing
the probability of being correct in terms of the likelihood functions
$p_{R|\mathcal{M}_i}(r)$, the a priori probabilities and the decision regions, we have

$$P_c = \int_{Z_0} \pi_0\, p_{R|\mathcal{M}_0}(r)\,dr + \int_{Z_1} \pi_1\, p_{R|\mathcal{M}_1}(r)\,dr$$

We want to maximize $P_c$ by selecting the decision regions $Z_0$ and $Z_1$.
Mimicking the ideas of the previous section, we associate each value of $r$
with the largest integral in the expression for $P_c$. Decision region $Z_0$, for
example, is defined by the collection of values of $r$ for which the first term
is largest. As all of the quantities involved are non-negative, the decision
rule maximizing the probability of a correct decision is

Note: Given $r$, choose $\mathcal{M}_i$ for which the product $\pi_i\, p_{R|\mathcal{M}_i}(r)$ is largest.


When we must select among more than two models, this result still applies
(prove this for yourself). Simple manipulations lead to the likelihood ratio
test when we must decide between two models.

$$\frac{p_{R|\mathcal{M}_1}(r)}{p_{R|\mathcal{M}_0}(r)} \underset{\mathcal{M}_0}{\overset{\mathcal{M}_1}{\gtrless}} \frac{\pi_0}{\pi_1}$$

Note that if the Bayes' costs were chosen so that $C_{ii} = 0$ and $C_{ij} = C$
($i \neq j$), the Bayes' cost and the maximum-probability-correct thresholds
would be the same.
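The "simple manipulations" amount to one division, and the remark about the costs is a one-line substitution into the Bayes threshold. The worked step below uses only symbols already defined in the text.

```latex
\begin{align*}
\pi_1\, p_{R|\mathcal{M}_1}(r)
  \underset{\mathcal{M}_0}{\overset{\mathcal{M}_1}{\gtrless}}
  \pi_0\, p_{R|\mathcal{M}_0}(r)
&\;\Longleftrightarrow\;
\Lambda(r) = \frac{p_{R|\mathcal{M}_1}(r)}{p_{R|\mathcal{M}_0}(r)}
  \underset{\mathcal{M}_0}{\overset{\mathcal{M}_1}{\gtrless}}
  \frac{\pi_0}{\pi_1} \\[4pt]
\eta
  = \frac{\pi_0\,(C_{10} - C_{00})}{\pi_1\,(C_{01} - C_{11})}
  \bigg|_{C_{ii} = 0,\; C_{ij} = C}
&= \frac{\pi_0\, C}{\pi_1\, C}
  = \frac{\pi_0}{\pi_1}
\end{align*}
```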


To evaluate the quality of the decision rule, we usually compute the
probability of error $P_e$ rather than the probability of being correct. This
quantity can be expressed in terms of the observations, the likelihood ratio,
and the sufficient statistic.

Equation:
$$P_e = \pi_0 \int_{Z_1} p_{R|\mathcal{M}_0}(r)\,dr + \pi_1 \int_{Z_0} p_{R|\mathcal{M}_1}(r)\,dr$$
$$= \pi_0 \int_{\Lambda > \eta} p_{\Lambda|\mathcal{M}_0}(\Lambda)\,d\Lambda + \pi_1 \int_{\Lambda < \eta} p_{\Lambda|\mathcal{M}_1}(\Lambda)\,d\Lambda$$
$$= \pi_0 \int_{\Upsilon > \gamma} p_{\Upsilon|\mathcal{M}_0}(\Upsilon)\,d\Upsilon + \pi_1 \int_{\Upsilon < \gamma} p_{\Upsilon|\mathcal{M}_1}(\Upsilon)\,d\Upsilon$$

These expressions point out that the likelihood ratio and the sufficient
statistic can each be considered a function of the observations $r$; hence,
they are random variables and have probability densities for each model.
When the likelihood ratio is non-monotonic, the first expression is most
difficult to evaluate. When monotonic, the middle expression often proves
to be the most difficult. No matter how it is calculated, no other decision
rule can yield a smaller probability of error. This statement is obvious as
we minimized the probability of error implicitly by maximizing the
probability of being correct because $P_e = 1 - P_c$.


From a grander viewpoint, these expressions represent an achievable lower 
bound on performance (as assessed by the probability of error). 
Furthermore, this probability will be non-zero if the conditional densities 
overlap over some range of values of $r$, such as occurred in the previous
example. Within regions of overlap, the observed values are ambiguous:
either model is consistent with the observations. Our "optimum" decision
rule operates in such regions by selecting the model that is most likely
(has the highest probability) to have generated the measured data.


Neyman-Pearson Criterion 


Situations occur frequently where assigning or measuring the a priori
probabilities $\pi_i$ is unreasonable. For example, just what is the a priori
probability of a supernova occurring in any particular region of the sky? We 
clearly need a model evaluation procedure that can function without a priori 
probabilities. This kind of test results when the so-called Neyman-Pearson 
criterion is used to derive the decision rule. 


Using nomenclature from radar, where model $\mathcal{M}_1$ represents the presence
of a target and $\mathcal{M}_0$ its absence, the various types of correct and incorrect
decisions have the following names.[footnote]

• Detection probability: we say it's there when it is;
  $P_D = \Pr[\text{say } \mathcal{M}_1 \mid \mathcal{M}_1 \text{ true}]$
• False-alarm probability: we say it's there when it's not;
  $P_F = \Pr[\text{say } \mathcal{M}_1 \mid \mathcal{M}_0 \text{ true}]$
• Miss probability: we say it's not there when it is;
  $P_M = \Pr[\text{say } \mathcal{M}_0 \mid \mathcal{M}_1 \text{ true}]$

The remaining probability $\Pr[\text{say } \mathcal{M}_0 \mid \mathcal{M}_0 \text{ true}]$ has historically been left
nameless and equals $1 - P_F$. We should also note that the detection and
miss probabilities are related by $P_M = 1 - P_D$. As these are conditional
probabilities, they do not depend on the a priori probabilities. Furthermore,
the two probabilities $P_F$ and $P_D$ characterize the errors when any decision
rule is used.

In statistics, a false-alarm is known as a type I error and a miss a type II 
error. 


These two probabilities are related to each other in an interesting way. 
Expressing these quantities in terms of the decision regions and the 
likelihood functions, we have 


$$P_F = \int_{Z_1} p_{R|\mathcal{M}_0}(r)\,dr$$

$$P_D = \int_{Z_1} p_{R|\mathcal{M}_1}(r)\,dr$$

As the region $Z_1$ shrinks, both of these probabilities tend toward zero; as
$Z_1$ expands to engulf the entire range of observation values, they both tend
toward unity. This rather direct relationship between $P_D$ and $P_F$ does not
mean that they equal each other; in most cases, as $Z_1$ expands, $P_D$
increases more rapidly than $P_F$ (we had better be right more often than we
are wrong!). However, the "ultimate" situation where a rule is always right
and never wrong ($P_D = 1$, $P_F = 0$) cannot occur when the conditional
distributions overlap. Thus, to increase the detection probability we must
also allow the false-alarm probability to increase. This behavior 
represents the fundamental tradeoff in detection theory. 
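The tradeoff can be made concrete with a small numerical sketch. It assumes a scalar observation that is Gaussian with mean 0 under model 0 and mean 1 under model 1 (both unit variance), and a decision region of the form Z_1 = {r > threshold}; these choices, and the helper names, are illustrative only.

```python
import math

def Q(x):
    """Probability that a zero-mean, unit-variance Gaussian exceeds x."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def pf_pd(threshold, mean1=1.0, sigma=1.0):
    """False-alarm and detection probabilities when Z_1 = {r > threshold}.

    Assumed models: r ~ N(0, sigma^2) under model 0, r ~ N(mean1, sigma^2) under model 1.
    """
    pf = Q(threshold / sigma)
    pd = Q((threshold - mean1) / sigma)
    return pf, pd

# Lowering the threshold enlarges Z_1; both probabilities grow together.
thresholds = (3.0, 2.0, 1.0, 0.0, -1.0)
roc = [pf_pd(t) for t in thresholds]
for t, (pf, pd) in zip(thresholds, roc):
    print(f"threshold = {t:5.1f}   P_F = {pf:.4f}   P_D = {pd:.4f}")
```

Every row shows a detection probability larger than the false-alarm probability, yet neither can be pushed to its ideal value without moving the other the wrong way.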


One can attempt to impose a performance criterion that depends only on 
these probabilities with the consequent decision rule not depending on the a 
priori probabilities. The Neyman-Pearson criterion assumes that the false-alarm
probability is constrained to be less than or equal to a specified value
$\alpha$ while we maximize the detection probability $P_D$.

$$\max_{Z_1} P_D \quad \text{subject to} \quad P_F \le \alpha$$

A subtlety of the solution we are about to obtain is that the underlying
probability distribution functions may not be continuous, with the
consequence that $P_F$ can never equal the constraining value $\alpha$.
Furthermore, an (unlikely) possibility is that the optimum value for the false-alarm
probability is somewhat less than the criterion value. Assume,
therefore, that we rephrase the optimization problem by requiring that the
false-alarm probability equal a value $\alpha'$ that is the largest possible value
less than or equal to $\alpha$.


This optimization problem can be solved using Lagrange multipliers; we 
seek to find the decision rule that maximizes 


$$F = P_D - \lambda (P_F - \alpha')$$

where $\lambda$ is a positive Lagrange multiplier. This optimization technique
amounts to finding the decision rule that maximizes $F$, then finding the
value of the multiplier that allows the criterion to be satisfied; the multiplier
weighs the detection probability against false-alarm probabilities in excess of the
criterion value. As is usual in the derivation of optimum decision rules, we
maximize these quantities with respect to the decision regions. Expressing
$P_D$ and $P_F$ in terms of them, we have

Equation:
$$F = \int_{Z_1} p_{R|\mathcal{M}_1}(r)\,dr - \lambda \left( \int_{Z_1} p_{R|\mathcal{M}_0}(r)\,dr - \alpha' \right)$$
$$= \lambda \alpha' + \int_{Z_1} \left( p_{R|\mathcal{M}_1}(r) - \lambda\, p_{R|\mathcal{M}_0}(r) \right) dr$$

To maximize this quantity with respect to $Z_1$, we need only to integrate
over those regions of $r$ where the integrand is positive. The region $Z_1$ thus
corresponds to those values of $r$ where $p_{R|\mathcal{M}_1}(r) > \lambda\, p_{R|\mathcal{M}_0}(r)$ and the
resulting decision rule is

$$\frac{p_{R|\mathcal{M}_1}(r)}{p_{R|\mathcal{M}_0}(r)} \underset{\mathcal{M}_0}{\overset{\mathcal{M}_1}{\gtrless}} \lambda$$


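The ubiquitous likelihood ratio test again appears; it is indeed the
fundamental quantity in hypothesis testing. Using either the logarithm of
the likelihood ratio or the sufficient statistic, this result can be expressed as

$$\ln \Lambda(r) \underset{\mathcal{M}_0}{\overset{\mathcal{M}_1}{\gtrless}} \ln \lambda \quad \text{or} \quad \Upsilon(r) \underset{\mathcal{M}_0}{\overset{\mathcal{M}_1}{\gtrless}} \gamma$$

We have not as yet found a value for the threshold. The false-alarm
probability can be expressed in terms of the Neyman-Pearson threshold in
two (useful) ways.

Equation:
$$P_F = \int_{\lambda}^{\infty} p_{\Lambda|\mathcal{M}_0}(\Lambda)\,d\Lambda$$
$$= \int_{\gamma}^{\infty} p_{\Upsilon|\mathcal{M}_0}(\Upsilon)\,d\Upsilon$$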


One of these implicit equations must be solved for the threshold by setting
$P_F$ equal to $\alpha'$. The selection of which to use is usually based on pragmatic
considerations: the easiest to compute. From the previous discussion of the
relationship between the detection and false-alarm probabilities, we find
that to maximize $P_D$ we must allow $\alpha'$ to be as large as possible while
remaining less than or equal to $\alpha$. Thus, we want to find the smallest value of $\lambda$
consistent with the constraint. Computation of the threshold is problem-dependent,
but a solution always exists.


Example: 

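An important application of the likelihood ratio test occurs when $R$ is a
Gaussian random vector for each model. Suppose the models correspond to
Gaussian random vectors having different mean values but sharing the
same covariance.

• $\mathcal{M}_0$: $R \sim \mathcal{N}(0, \sigma^2 I)$
• $\mathcal{M}_1$: $R \sim \mathcal{N}(m, \sigma^2 I)$

$R$ is of dimension $L$ and has statistically independent, equi-variance
components. The vector of means $m = (m_0 \dots m_{L-1})^T$ distinguishes the
two models. The likelihood functions associated with this problem are

$$p_{R|\mathcal{M}_0}(r) = \prod_{l=0}^{L-1} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{r_l^2}{2\sigma^2} \right)$$
$$p_{R|\mathcal{M}_1}(r) = \prod_{l=0}^{L-1} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{(r_l - m_l)^2}{2\sigma^2} \right)$$

The likelihood ratio $\Lambda(r)$ becomes

$$\Lambda(r) = \frac{\prod_{l=0}^{L-1} \exp\left( -\frac{(r_l - m_l)^2}{2\sigma^2} \right)}{\prod_{l=0}^{L-1} \exp\left( -\frac{r_l^2}{2\sigma^2} \right)}$$

This expression for the likelihood ratio is complicated. In the Gaussian
case (and many others), we use the logarithm to reduce the complexity of
the likelihood ratio and form a sufficient statistic.

Equation:
$$\ln \Lambda(r) = \sum_{l=0}^{L-1} \left( -\frac{1}{2} \frac{(r_l - m_l)^2}{\sigma^2} + \frac{1}{2} \frac{r_l^2}{\sigma^2} \right)$$
$$= \frac{1}{\sigma^2} \sum_{l=0}^{L-1} m_l r_l - \frac{1}{2\sigma^2} \sum_{l=0}^{L-1} m_l^2$$

The likelihood ratio test then has the much simpler, but equivalent form

$$\sum_{l=0}^{L-1} m_l r_l \underset{\mathcal{M}_0}{\overset{\mathcal{M}_1}{\gtrless}} \sigma^2 \ln \eta + \frac{1}{2} \sum_{l=0}^{L-1} m_l^2$$

To focus on the model evaluation aspects of this problem, let's assume the
means equal each other and are a positive constant: $m_l = m > 0$.
[footnote] We now have

$$\sum_{l=0}^{L-1} r_l \underset{\mathcal{M}_0}{\overset{\mathcal{M}_1}{\gtrless}} \frac{\sigma^2}{m} \ln \eta + \frac{Lm}{2}$$

Note that all that need be known about the observations $\{r_l\}$ is their sum.
This quantity is the sufficient statistic for the Gaussian problem:
$\Upsilon(r) = \sum_l r_l$ and $\gamma = \frac{\sigma^2}{m} \ln \eta + \frac{Lm}{2}$.

What would happen if the mean were negative?
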
When trying to compute the probability of error or the threshold in the 
Neyman-Pearson criterion, we must find the conditional probability 
density of one of the decision statistics: the likelihood ratio, the log- 
likelihood, or the sufficient statistic. The log-likelihood and the sufficient 
statistic are quite similar in this problem, but clearly we should use the
latter. One practical property of the sufficient statistic is that it usually
simplifies computations. For this Gaussian example, the sufficient statistic
is a Gaussian random variable under each model.

• $\mathcal{M}_0$: $\Upsilon(r) \sim \mathcal{N}(0, L\sigma^2)$
• $\mathcal{M}_1$: $\Upsilon(r) \sim \mathcal{N}(Lm, L\sigma^2)$


To find the probability of error from [link], we must evaluate the area
under a Gaussian probability density function. These integrals are
succinctly expressed in terms of $Q(x)$, which denotes the probability that a
unit-variance, zero-mean Gaussian random variable exceeds $x$.

$$Q(x) = \int_{x}^{\infty} \frac{1}{\sqrt{2\pi}} e^{-u^2/2}\,du$$

As $1 - Q(x) = Q(-x)$, the probability of error can be written as

$$P_e = \pi_0\, Q\left( \frac{\gamma}{\sqrt{L}\,\sigma} \right) + \pi_1\, Q\left( \frac{Lm - \gamma}{\sqrt{L}\,\sigma} \right)$$

An interesting special case occurs when $\pi_0 = 1/2 = \pi_1$. In this case,
$\gamma = \frac{Lm}{2}$ and the probability of error becomes

$$P_e = Q\left( \frac{\sqrt{L}\, m}{2\sigma} \right)$$
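A short numerical sketch can confirm that the general error expression collapses to the special case. It uses the fact that the sufficient statistic is N(0, Lσ²) under model 0 and N(Lm, Lσ²) under model 1, so that P_e = π_0 Q(γ/(√L σ)) + π_1 Q((Lm − γ)/(√L σ)); the values L = 4, m = 1, σ = 1 and the function names are arbitrary choices for illustration.

```python
import math

def Q(x):
    """Probability that a zero-mean, unit-variance Gaussian exceeds x."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def prob_error(pi0, pi1, gamma, L, m, sigma):
    """Error probability of the sum-of-observations test with threshold gamma."""
    spread = math.sqrt(L) * sigma          # standard deviation of the sufficient statistic
    false_alarm = Q(gamma / spread)        # statistic exceeds gamma under model 0
    miss = Q((L * m - gamma) / spread)     # statistic falls below gamma under model 1
    return pi0 * false_alarm + pi1 * miss

# Illustrative values (assumptions): four observations, unit mean, unit variance.
L, m, sigma = 4, 1.0, 1.0
gamma = L * m / 2                          # threshold for equal priors (eta = 1)

pe_general = prob_error(0.5, 0.5, gamma, L, m, sigma)
pe_special = Q(math.sqrt(L) * m / (2 * sigma))
```

For these values both computations evaluate Q(1), so the two numbers coincide.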


As $Q(\cdot)$ is a monotonically decreasing function, the probability of error
decreases with increasing values of the ratio $\frac{\sqrt{L}\, m}{2\sigma}$. However, as shown in
[link], $Q(\cdot)$ decreases in a nonlinear fashion. Thus, increasing $m$ by a
factor of two may decrease the probability of error by a larger or a smaller
factor; the amount of change depends on the initial value of the ratio.

To find the threshold for the Neyman-Pearson test from the expressions
given on [link], we need the area under a Gaussian density.

Equation:
$$P_F = Q\left( \frac{\gamma}{\sqrt{L\sigma^2}} \right) = \alpha'$$

As $Q(\cdot)$ is a monotonic and continuous function, we can set $\alpha'$ equal to the
criterion value $\alpha$ with the result

$$\gamma = \sqrt{L}\,\sigma\, Q^{-1}(\alpha)$$


where $Q^{-1}(\cdot)$ denotes the inverse function of $Q(\cdot)$. The solution of this
equation cannot be performed analytically as no closed form expression
exists for $Q(\cdot)$ (much less its inverse function). The criterion value must be
found from tables or numerical routines. Because Gaussian problems arise 
frequently, the accompanying table provides numeric values for this 
quantity at the decade points. 
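As one example of such a numerical routine, the sketch below evaluates Q and its inverse with the Python standard library and then forms the Neyman-Pearson threshold γ = √L σ Q⁻¹(α). The helper names are illustrative, and the values L = 16, σ = 1 and α = 10⁻² are arbitrary assumptions.

```python
import math
from statistics import NormalDist

def Q(x):
    """Probability that a zero-mean, unit-variance Gaussian exceeds x."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def Q_inv(p):
    """Inverse of Q for 0 < p < 1: the x satisfying Q(x) = p."""
    # Q(x) equals the standard normal CDF evaluated at -x, hence x = -inv_cdf(p).
    return -NormalDist().inv_cdf(p)

def np_threshold(alpha, L, sigma):
    """Neyman-Pearson threshold on the sum of L observations with noise variance sigma^2."""
    return math.sqrt(L) * sigma * Q_inv(alpha)

# Inverse values at the decade points 1e-1 ... 1e-6.
decade_values = {k: Q_inv(10.0 ** -k) for k in range(1, 7)}
for k, value in decade_values.items():
    print(f"Q_inv(1e-{k}) = {value:.3f}")

gamma = np_threshold(alpha=1e-2, L=16, sigma=1.0)
```

Computing the inverse through the normal quantile function at p, rather than at 1 − p, avoids losing precision when p is very small.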


$x$          $Q^{-1}(x)$
$10^{-1}$    1.282
$10^{-2}$    2.326
$10^{-3}$    3.090
$10^{-4}$    3.719
$10^{-5}$    4.265
$10^{-6}$    4.753

The table displays values for $Q^{-1}(\cdot)$ that can be used to
determine thresholds in the Neyman-Pearson variant of the likelihood ratio
test. Note how little the inverse function changes for decade changes in its
argument; $Q(\cdot)$ is indeed very nonlinear. The detection probability of the
Neyman-Pearson decision rule is given by

$$P_D = Q\left( Q^{-1}(\alpha) - \frac{\sqrt{L}\, m}{\sigma} \right)$$


Detection of Signals in Noise

Detection theory is specialized to the most common decision problem that 
occurs in signal processing: determining which signal was received in the 
presence of additive noise. 


Far and away the most common decision problem in signal processing is 
determining which of several signals occurs in data contaminated by 
additive noise. Specializing to the case when one of two possible signals
is present, the data models are

• $\mathcal{M}_0$: $R(l) = s_0(l) + N(l)$, $0 \le l < L$
• $\mathcal{M}_1$: $R(l) = s_1(l) + N(l)$, $0 \le l < L$

where $\{s_i(l)\}$ denotes the known signals and $N(l)$ denotes additive noise
modeled as a stationary stochastic process. This situation is known as the
binary detection problem: distinguish between two possible signals
present in a noisy waveform.

We form the discrete-time observations into a vector:
$R = (R(0) \dots R(L-1))^T$. Now the models become

• $\mathcal{M}_0$: $R = s_0 + N$
• $\mathcal{M}_1$: $R = s_1 + N$


To apply our detection theory results, we need the probability density of R 
under each model. As the only probabilistic component of the observations 
is the noise, the required density for the detection problem is given by 


$$p_{R|\mathcal{M}_i}(r) = p_N(r - s_i)$$

and the corresponding likelihood ratio by

$$\Lambda(r) = \frac{p_N(r - s_1)}{p_N(r - s_0)}$$


Much of detection theory revolves about interpreting this likelihood ratio 
and deriving the detection threshold. 


Additive White Gaussian Noise 


By far the easiest detection problem to solve occurs when the noise vector 
consists of statistically independent, identically distributed, Gaussian 
random variables, what is commonly termed white Gaussian noise. The 
mean of white noise is usually taken to be zero[footnote] and each
component's variance is $\sigma^2$. The equal-variance assumption implies the
noise characteristics are unchanging throughout the entire set of
observations. The probability density of the noise vector evaluated at
$r - s_i$ equals that of a Gaussian random vector having independent
components with mean $s_i$.

$$p_N(r - s_i) = \left( \frac{1}{2\pi\sigma^2} \right)^{L/2} \exp\left( -\frac{1}{2\sigma^2} (r - s_i)^T (r - s_i) \right)$$

The resulting detection problem is similar to the Gaussian example we
previously examined, with the difference here being a non-zero mean (the
signal) under both models. The logarithm of the likelihood ratio becomes

$$(r - s_0)^T (r - s_0) - (r - s_1)^T (r - s_1) \underset{\mathcal{M}_0}{\overset{\mathcal{M}_1}{\gtrless}} 2\sigma^2 \ln \eta$$

The usual simplifications yield

$$\left( r^T s_1 - \frac{s_1^T s_1}{2} \right) - \left( r^T s_0 - \frac{s_0^T s_0}{2} \right) \underset{\mathcal{M}_0}{\overset{\mathcal{M}_1}{\gtrless}} \sigma^2 \ln \eta$$

The model-specific components on the left side express the signal
processing operations for each model.[footnote]

The zero-mean assumption is realistic for the detection problem. If the
mean were non-zero, simply subtracting it from the observed sequence
results in a zero-mean noise component.

If more than two signals were assumed possible, quantities such as these
would need to be computed for each signal and the largest selected.


Each term in the computations for the optimum detector has a signal
processing interpretation. When expanded, the term $s_i^T s_i$ equals
$\sum_{l=0}^{L-1} s_i^2(l)$, the signal energy $E_i$. The remaining term, $r^T s_i$, is the only
one involving the observations and hence constitutes the sufficient statistic
$\Upsilon_i(r)$ for the additive white Gaussian noise detection problem.

$$\Upsilon_i(r) = r^T s_i$$


An abstract, but physically relevant, interpretation of this important quantity 
comes from the theory of linear vector spaces. In that context, the quantity 
$r^T s_i$ would be termed the projection of $r$ onto $s_i$. From the Schwarz
inequality, we know that the largest value of this projection occurs when 
these vectors are proportional to each other. Thus, a projection measures 
how much alike two vectors are: they are completely alike when they are 
parallel (proportional to each other) and completely dissimilar when 
orthogonal (the projection is zero). In effect, the projection operation 
removes those components from the observations which are orthogonal to 
the signal, thereby generalizing the familiar notion of filtering a signal 
contaminated by broadband noise. In filtering, the signal-to-noise ratio of a 
bandlimited signal can be drastically improved by lowpass filtering; the 
output would consist only of the signal and "in-band" noise. The projection 
serves a similar role, ideally removing those "out-of-band" components (the
orthogonal ones) and retaining the "in-band" ones (those parallel to the 
signal). 


Matched Filtering 


When the projection operation is expanded as $r^T s_i = \sum_{l=0}^{L-1} r(l) s_i(l)$,
another signal processing interpretation emerges. The projection now
describes a finite impulse response (FIR) filtering operation evaluated at a
specific index. To demonstrate this interpretation, let $h(l)$ be the unit-sample
response of a linear, shift-invariant filter where $h(l) = 0$ for $l < 0$
and $l \ge L$. Letting $r(l)$ be the filter's input sequence, the convolution sum
expresses the output.

$$r(k) * h(k) = \sum_{l=k-(L-1)}^{k} r(l)\, h(k - l)$$

Letting $k = L - 1$, the index at which the unit-sample response's last value
overlaps the input's value at the origin, we have

$$r(k) * h(k) \big|_{k=L-1} = \sum_{l=0}^{L-1} r(l)\, h(L - 1 - l)$$

Suppose we set the unit-sample response equal to the index-reversed, then
delayed signal.

$$h(l) = s_i(L - 1 - l)$$

In this case, the filtering operation becomes a projection operation.

$$r(k) * s_i(L - 1 - k) \big|_{k=L-1} = \sum_{l=0}^{L-1} r(l)\, s_i(l)$$
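The equivalence between filtering and projection can be checked numerically. The sketch below implements a plain convolution sum, builds the unit-sample response by index-reversing the signal, and samples the output at index L − 1; the observation and signal values are made up for illustration.

```python
def convolve(x, h):
    """Full linear convolution y(k) = sum over l of x(l) h(k - l)."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y

def matched_filter_output(r, s):
    """Filter r with h(l) = s(L - 1 - l) and sample the output at k = L - 1."""
    L = len(s)
    h = s[::-1]                      # index-reversed signal, delayed by L - 1
    return convolve(r, h)[L - 1]

# Hypothetical observations and known signal, both of length L = 4.
r = [0.9, -1.2, 0.3, 2.1]
s = [1.0, -1.0, 0.0, 2.0]

projection = sum(rl * sl for rl, sl in zip(r, s))   # r^T s computed directly
filtered = matched_filter_output(r, s)
```

The two quantities agree up to floating-point rounding, so the detector can be built either from a dot product or from a filter sampled once.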


[link] depicts these computations graphically.

The detector for signals contained in additive, white Gaussian noise
consists of a matched filter, whose output is sampled at the duration of
the signal and half of the signal energy is subtracted from it. The
optimum detector incorporates a matched filter for each signal and
compares their outputs to determine the largest.


The sufficient statistic for the $i$th signal is thus expressed in signal
processing notation as $r(k) * s_i(L - 1 - k) \big|_{k=L-1} - \frac{E_i}{2}$. The filtering term
is called a matched filter because the observations are passed through a 
filter whose unit-sample response "matches" that of the signal being sought. 
We sample the matched filter's output at the precise moment when all of the 
observations fall within the filter's memory and then adjust this value by 
half the signal energy. The adjusted values for the two assumed signals are 
subtracted and compared to a threshold. 
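Put together, the two-signal detector is only a few lines. The following sketch computes the energy-adjusted matched-filter output for each signal (written as a dot product, which equals the filter output sampled at index L − 1) and compares their difference with σ² ln η. The signals, noise values and function names are hypothetical.

```python
import math

def adjusted_statistic(r, s):
    """Projection of r onto s minus half the signal energy."""
    energy = sum(v * v for v in s)
    return sum(rl * sl for rl, sl in zip(r, s)) - energy / 2.0

def detect(r, s0, s1, sigma, eta=1.0):
    """Return 1 to announce model 1 (signal s1 present), 0 to announce model 0."""
    difference = adjusted_statistic(r, s1) - adjusted_statistic(r, s0)
    return 1 if difference > sigma ** 2 * math.log(eta) else 0

# Hypothetical equal-energy signals and a fixed noise realization.
s0 = [1.0, 1.0, 1.0, 1.0]
s1 = [1.0, -1.0, 1.0, -1.0]
noise = [0.2, -0.1, 0.3, 0.1]

r_from_s1 = [a + n for a, n in zip(s1, noise)]
r_from_s0 = [a + n for a, n in zip(s0, noise)]

decision_1 = detect(r_from_s1, s0, s1, sigma=1.0)
decision_0 = detect(r_from_s0, s0, s1, sigma=1.0)
```

With η = 1 the threshold is zero, so the detector simply picks the signal whose adjusted statistic is larger.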


Detection Performance 


To compute the performance probabilities, the expressions should be 
simplified in the ways discussed in previous sections. As the energy terms 
are known a priori they can be incorporated into the threshold with the 
result 


$$\sum_{l=0}^{L-1} r(l) \left( s_1(l) - s_0(l) \right) \underset{\mathcal{M}_0}{\overset{\mathcal{M}_1}{\gtrless}} \sigma^2 \ln \eta + \frac{E_1 - E_0}{2}$$

The left term constitutes the sufficient statistic for the binary detection
problem. Because the additive noise is presumed Gaussian, the sufficient
statistic is a Gaussian random variable no matter which model is assumed.
Under $\mathcal{M}_i$, the specifics of this probability distribution are

$$\sum_{l=0}^{L-1} r(l) \left( s_1(l) - s_0(l) \right) \sim \mathcal{N}(m_i, \mathrm{var}_i)$$

where the mean and variance of the Gaussian distribution are given
respectively by

$$m_i = \sum_{l} s_i(l) \left( s_1(l) - s_0(l) \right)$$

$$\mathrm{var}_i = \sigma^2 \sum_{l} \left( s_1(l) - s_0(l) \right)^2$$

Note that the variance does not depend on the model. The false-alarm
probability is given by

$$P_F = Q\left( \frac{\sigma^2 \ln \eta + \frac{E_1 - E_0}{2} - m_0}{\mathrm{var}_0^{1/2}} \right)$$

The signal-related terms in the numerator of this expression can be
manipulated so that the false-alarm probability of the optimal white
Gaussian noise detector is succinctly expressed by

$$P_F = Q\left( \frac{\ln \eta + \frac{1}{2\sigma^2} \sum_{l} \left( s_1(l) - s_0(l) \right)^2}{\frac{1}{\sigma} \left( \sum_{l} \left( s_1(l) - s_0(l) \right)^2 \right)^{1/2}} \right)$$


Note that the only signal-related quantity affecting this performance 
probability (and all of the others as well) is the ratio of the energy in the 
difference signal to the noise variance. The larger this ratio, the better 
(i.e., smaller) the performance probabilities become. Note that the details of 
the signal waveforms do not greatly affect the energy of the difference 
signal. For example, consider the case where the two signal energies are 
equal ($E_0 = E_1 = E$); the energy of the difference signal is given by
$2E - 2\sum_{l} s_0(l) s_1(l)$. The largest value of this energy occurs when the
signals are negatives of each other, with the difference-signal energy
equaling $4E$. Thus, equal-energy but opposite-signed signals such as sine
waves, square-waves, Bessel functions, etc. all yield exactly the same
performance levels. The essential signal properties that do yield good
performance values are elucidated by an alternate interpretation. The term
$\sum_{l} (s_1(l) - s_0(l))^2$ equals $\| s_1 - s_0 \|^2$, the square of the $L^2$ norm of the difference
signal. Geometrically, the difference-signal energy is the same quantity as 
the square of the Euclidean distance between the two signals. In these 
terms, a larger distance between the two signals means better performance. 
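The role of the distance between the signals can be illustrated numerically. This sketch evaluates the false-alarm expression P_F = Q((ln η + d²/(2σ²)) / (d/σ)), where d² is the difference-signal energy, for an antipodal pair and for an orthogonal pair of equal energy. The particular signals, σ = 1, η = 1 and the function names are assumptions made for the example.

```python
import math

def Q(x):
    """Probability that a zero-mean, unit-variance Gaussian exceeds x."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def difference_energy(s0, s1):
    """Squared Euclidean distance between the two signals."""
    return sum((b - a) ** 2 for a, b in zip(s0, s1))

def false_alarm_probability(s0, s1, sigma, eta=1.0):
    """False-alarm probability of the optimal white Gaussian noise detector."""
    d2 = difference_energy(s0, s1)
    return Q((math.log(eta) + d2 / (2 * sigma ** 2)) / (math.sqrt(d2) / sigma))

s = [1.0, -1.0, 1.0, -1.0]             # energy E = 4
antipodal = [-v for v in s]            # negative of s: difference energy 4E
orthogonal = [1.0, 1.0, 1.0, 1.0]      # same energy, zero inner product with s: 2E

pf_antipodal = false_alarm_probability(s, antipodal, sigma=1.0)
pf_orthogonal = false_alarm_probability(s, orthogonal, sigma=1.0)
```

The antipodal pair sits farther apart (difference energy 16 versus 8) and therefore gives the smaller false-alarm probability.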


Example: 

Detection, Gaussian example 

A common detection problem is to determine whether a signal is present
($\mathcal{M}_1$) or not ($\mathcal{M}_0$). To model the latter case, the signal equals zero:
$s_0(l) = 0$. The optimal detector relies on filtering the data with a matched
filter having a unit-sample response based on the signal that might be
present. Letting the signal under $\mathcal{M}_1$ be denoted simply by $s(l)$, the
optimal detector consists of

$$r(l) * s(L - 1 - l) \big|_{l=L-1} - \frac{E}{2} \underset{\mathcal{M}_0}{\overset{\mathcal{M}_1}{\gtrless}} \sigma^2 \ln \eta \quad \text{or} \quad r(l) * s(L - 1 - l) \big|_{l=L-1} \underset{\mathcal{M}_0}{\overset{\mathcal{M}_1}{\gtrless}} \gamma$$

The false-alarm and detection probabilities are given by

$$P_F = Q\left( \frac{\gamma}{\sqrt{E}\,\sigma} \right) \qquad P_D = Q\left( Q^{-1}(P_F) - \sqrt{\frac{E}{\sigma^2}} \right)$$

[link] displays the probability of detection as a function of the signal-to-noise
ratio $\frac{E}{\sigma^2}$ for several values of false-alarm probability. Given an
estimate of the expected signal-to-noise ratio, these curves can be used to
assess the trade-off between the false-alarm and detection probabilities.
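Curves of this kind can be generated with a few lines of Python. The sketch assumes the signal-present versus signal-absent setting of this example, in which the matched-filter output is Gaussian with variance σ²E and mean 0 or E, so that P_D = Q(Q⁻¹(P_F) − √(E/σ²)). The decibel grid, the chosen false-alarm probability and the function names are illustrative.

```python
import math
from statistics import NormalDist

def Q(x):
    """Probability that a zero-mean, unit-variance Gaussian exceeds x."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def Q_inv(p):
    """Inverse of Q for 0 < p < 1."""
    return -NormalDist().inv_cdf(p)

def detection_probability(pf, snr):
    """Detection probability at false-alarm probability pf and ratio snr = E / sigma^2."""
    return Q(Q_inv(pf) - math.sqrt(snr))

# One curve: fixed false-alarm probability, signal-to-noise ratio swept in decibels.
pf = 1e-2
snr_db = [0, 5, 10, 15, 20]
curve = [detection_probability(pf, 10 ** (db / 10)) for db in snr_db]
for db, pd in zip(snr_db, curve):
    print(f"E/sigma^2 = {db:2d} dB   P_D = {pd:.4f}")
```

At zero signal energy the detection probability equals the false-alarm probability, which is a convenient sanity check on any implementation.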


Pr=Q 


Probability of Detection

The probability of detection is plotted versus signal-to-noise ratio for
various values of the false-alarm probability $P_F$. False-alarm
probabilities range from $10^{-1}$ down to $10^{-6}$ by decades. The
matched filter receiver was used since the noise is white and
Gaussian. Note how the range of signal-to-noise ratios over which the
detection probability changes shrinks as the false-alarm probability
decreases. This effect is a consequence of the non-linear nature of the
function $Q(\cdot)$.
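
Curves of this kind can be tabulated directly. The sketch below assumes the standard result for a known signal in additive white Gaussian noise, $P_D = Q\left(Q^{-1}(P_F) - \sqrt{E/\sigma^2}\right)$, and uses Python's `statistics.NormalDist` for $Q(\cdot)$ and its inverse. The function names and the chosen signal-to-noise values are illustrative assumptions.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # zero mean, unit variance

def Q(x):
    """Tail probability of a standard Gaussian: Pr[X > x]."""
    return _STD_NORMAL.cdf(-x)

def Q_inv(p):
    """Inverse of Q for 0 < p < 1."""
    return -_STD_NORMAL.inv_cdf(p)

def detection_probability(snr, p_false_alarm):
    """P_D for a given linear-scale SNR (E / sigma^2) and false-alarm probability."""
    return Q(Q_inv(p_false_alarm) - snr ** 0.5)

# Tabulate P_D at a few assumed SNR values (in dB) for P_F = 1e-1 ... 1e-6.
snr_db_values = (0, 10, 20)
for exponent in range(1, 7):
    p_f = 10.0 ** -exponent
    row = [detection_probability(10 ** (db / 10), p_f) for db in snr_db_values]
    print(f"P_F = 1e-{exponent}:", ", ".join(f"{p_d:.4f}" for p_d in row))
```

At zero signal-to-noise ratio the detection probability equals the false-alarm probability, which gives a quick sanity check on the curves.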


The important parameter determining detector performance derived in this
example is the signal-to-noise ratio $E/\sigma^2$: the larger it is, the smaller the
false-alarm probability is (generally speaking). Signal-to-noise ratios can be
measured in many different ways. For example, one measure might be the
ratio of the rms signal amplitude to the rms noise amplitude. Note that the
important one for the detection problem is much different. The signal
portion is the sum of the squared signal values over the entire set of
observed values (the signal energy); the noise portion is the variance of
each noise component (the noise power). Thus, energy can be increased in
two ways that increase the signal-to-noise ratio: the signal can be made
larger or the observations can be extended to encompass a larger number of
values.
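
The difference between the two signal-to-noise measures can be made concrete. This is a small sketch, assuming a hypothetical sinusoidal signal and unit noise variance; `detection_snr`, `rms_amplitude_ratio`, and `sinusoid` are illustrative names.

```python
import math

def detection_snr(signal, noise_variance):
    """Signal energy divided by the variance of each noise component."""
    return sum(x * x for x in signal) / noise_variance

def rms_amplitude_ratio(signal, noise_variance):
    """Ratio of rms signal amplitude to rms noise amplitude."""
    rms_signal = math.sqrt(sum(x * x for x in signal) / len(signal))
    return rms_signal / math.sqrt(noise_variance)

def sinusoid(amplitude, length):
    """Hypothetical test signal at 0.1 cycles per sample."""
    return [amplitude * math.sin(2 * math.pi * 0.1 * l) for l in range(length)]

noise_variance = 1.0              # assumed noise power
base = sinusoid(1.0, 100)
louder = sinusoid(2.0, 100)       # larger signal, same number of observations
longer = sinusoid(1.0, 200)       # same signal level, twice as many observations

for name, sig in (("base", base), ("louder", louder), ("longer", longer)):
    print(name, detection_snr(sig, noise_variance), rms_amplitude_ratio(sig, noise_variance))
```

Extending the observation interval raises the energy-based ratio while leaving the rms amplitude ratio unchanged, which is why only the former tracks detection performance.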


To illustrate this point, [link] shows how a matched filter operates. The
signal is very difficult to discern in the presence of noise. However, the
signal-to-noise ratio that determines detection performance belies the eye.
The matched filter output demonstrates an amazingly clean signal.

The signal consists of ten cycles of
$\sin(\omega_0 l)$ with $\omega_0 = 2\pi \times 0.1$. The middle
panel shows the signal with noise added.
The lower portion depicts the matched-filter
output. The detection threshold was
set for a false-alarm probability of $10^{-2}$.
Even though the matched filter output
crosses the threshold several times, only the
output at $l = L - 1$ matters. For this
example, it coincides with the peak output
of the matched filter.
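
A simulation along these lines can be sketched as follows. The noise standard deviation, the random seed, and the function name `matched_filter_output` are assumptions made for illustration; they are not taken from the figure. The threshold uses the standard white-Gaussian-noise relation $\gamma = \sigma\sqrt{E}\,Q^{-1}(P_F)$.

```python
import math
import random
from statistics import NormalDist

def matched_filter_output(data, signal):
    """Filter data with the unit-sample response h(k) = signal(L - 1 - k)."""
    L = len(signal)
    h = signal[::-1]
    output = []
    for n in range(len(data)):
        # Only terms with n - k >= 0 contribute (no data before l = 0).
        output.append(sum(h[k] * data[n - k] for k in range(min(n, L - 1) + 1)))
    return output

L = 100                 # ten cycles at 0.1 cycles per sample
sigma = 2.0             # assumed noise standard deviation
p_false_alarm = 1e-2

signal = [math.sin(2 * math.pi * 0.1 * l) for l in range(L)]
energy = sum(x * x for x in signal)

rng = random.Random(0)  # fixed seed so the run is repeatable
data = [s + rng.gauss(0.0, sigma) for s in signal]

# Q^{-1}(p) = -Phi^{-1}(p) for the standard Gaussian distribution.
q_inv = -NormalDist().inv_cdf(p_false_alarm)
threshold = sigma * math.sqrt(energy) * q_inv

output = matched_filter_output(data, signal)
decision_statistic = output[L - 1]  # only this output sample matters
print("statistic:", decision_statistic, "threshold:", threshold)
print("signal declared present:", decision_statistic > threshold)
```

The output at index $L - 1$ equals the correlation $\sum_l R(l)\,s(l)$, so the running filter and the correlation sum give the same decision statistic.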


Beyond the Basics 


For more... 


Many problems in statistical signal processing and communications can be 
solved using basic detection theory. For example, determining whether an 
airplane is located at a specific range and direction with radar and whether a 
received bit is a 0 or a 1 are both solved with matched filter detectors. 
However, elaborations on deciding which of two models best
describes a given dataset abound. For instance, suppose we have more than
two models for the data. Previous modules hint at how to expand beyond
two models (see [link] and [link]). However, no extension of Neyman-Pearson
detectors to more than two models exists.


To learn more about the basics of detection theory and beyond, see the
books by Van Trees, Kay, and McDonough and Whalen, and several
modules on Connexions (search for 'likelihood ratio', 'detection theory' and
'matched filter'). The Wikipedia article on Statistical Hypothesis Testing
describes what is called detection theory here more abstractly from a
statistician's viewpoint.


Beyond Simple Problems 


More interesting (and challenging) are situations where the data models are 
imprecise to some degree. The simplest case is when some model parameter 
is not known. For example, suppose the exact time of the radar return is not 
known (i.e., the airplane's range is uncertain). Here, the unknown parameter 
is the signal's time-of-origin. We must somehow determine that parameter 
and determine if the signal is actually present. 


As you might expect, the likelihood ratio remains the focus of attention,
now in the guise of what is known as the generalized likelihood ratio test
(GLRT) (see this Connexions module).[footnote] This technique and others
open the door to what are known as simultaneous estimation and
detection algorithms.

The Wikipedia article on the Likelihood-ratio test is concerned with the 
Generalized Likelihood Ratio Test. 
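
For the unknown time-of-origin case, one common form of the GLRT can be sketched as follows. It assumes a known signal shape in additive white Gaussian noise, with the signal lying entirely inside the observation window. Under those assumptions, maximizing the likelihood ratio over the delay amounts to maximizing the correlation over candidate delays. The function name, the threshold value, and the simulation parameters are hypothetical.

```python
import math
import random

def glrt_unknown_delay(data, signal, threshold):
    """Return (decision, estimated_delay, statistic) for a signal of unknown delay."""
    L = len(signal)
    best_delay, best_stat = None, -math.inf
    for delay in range(len(data) - L + 1):
        # Correlate the data with the signal shifted to this candidate delay.
        stat = sum(signal[k] * data[delay + k] for k in range(L))
        if stat > best_stat:
            best_delay, best_stat = delay, stat
    return best_stat > threshold, best_delay, best_stat

L = 100
signal = [math.sin(2 * math.pi * 0.1 * l) for l in range(L)]
true_delay = 120                      # assumed, unknown to the detector
rng = random.Random(1)                # fixed seed so the run is repeatable
data = [rng.gauss(0.0, 1.0) for _ in range(300)]
for k in range(L):
    data[true_delay + k] += signal[k]

# The threshold here is an arbitrary illustrative value, not a designed one.
print(glrt_unknown_delay(data, signal, threshold=25.0))
```

The delay that maximizes the correlation serves as the parameter estimate, so detection and estimation happen in the same pass over the data.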


Some unknowns may not be parametric and prevent a precise description of 
a model by a probability function. What do we do when the amplitude 
distribution function of the additive noise is not well characterized?
So-called robust detection represents one attempt to address these problems.
See [link] for more. 


Beyond variations of model uncertainties are new approaches to solving
detection problems. For example, a subtlety in the basic formulation is the
assumption that all the data are available. Can we do just as well by attempting
to make a decision early, as the data arrive? Note that always taking a fixed,
smaller number of samples leads to worse performance. Instead, we
acquire only the amount of data needed to make a decision. This approach
is known as sequential detection or the sequential probability ratio test
(SPRT). See modules in Connexions and the classic book by Wald.
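
A minimal sketch of a sequential test follows. It assumes a known signal in additive white Gaussian noise and uses Wald's approximate thresholds, $\ln\left((1 - P_M)/P_F\right)$ and $\ln\left(P_M/(1 - P_F)\right)$, where $P_M$ is the miss probability. The function name `sprt` and the example values are illustrative.

```python
import math

def sprt(samples, signal, noise_variance, p_false_alarm, p_miss):
    """Accumulate the log-likelihood ratio until it crosses a threshold.

    Returns (decision, samples_used); decision is "M1", "M0", or "undecided".
    """
    upper = math.log((1.0 - p_miss) / p_false_alarm)   # cross above: announce M1
    lower = math.log(p_miss / (1.0 - p_false_alarm))   # cross below: announce M0
    llr = 0.0
    used = 0
    for r, s in zip(samples, signal):
        used += 1
        # Log-likelihood ratio increment for one Gaussian observation.
        llr += (s * r - 0.5 * s * s) / noise_variance
        if llr >= upper:
            return "M1", used
        if llr <= lower:
            return "M0", used
    return "undecided", used

# Noise-free illustration with a constant unit signal and unit noise variance.
signal = [1.0] * 50
print(sprt([1.0] * 50, signal, 1.0, 0.01, 0.01))  # data match the signal
print(sprt([0.0] * 50, signal, 1.0, 0.01, 0.01))  # data contain no signal
```

The number of samples used is itself random when noise is present, which is the sense in which the test acquires only as much data as it needs.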


